Legacy:Package File Format
The Unreal Engine uses a single file format to store all its game content. You may have seen many different file types, like .utx (textures), .unr (maps), .umx (music) and .u (code), but from a technical standpoint there is no difference between those files; the different file extensions are only used to help organize the packages in the directory structure. The following article describes the basic structure of this file format. It omits many details (such as tons of constants), but there’s a good reference available on the net by Antonio Cordero Balcazar (see links).
Assumptions:
This is a rather technical article. It requires a basic understanding of object-oriented programming as well as the willingness to use a hex editor, if needed. This is NOT intended to be full documentation of the file format, only a brief introduction.
The Structure of the File
Overview
Every package file can be roughly split into three logical parts: the header, the three index tables (name-table, import-table and export-table) and the data itself. Only the header has a fixed position (at offset 0); all other parts can be placed anywhere within the file without confusing the engine.
Most of the time, though, the layout looks like this:
- Header
- Name-Table
- Import-Table
- Data
- Export-Table
It may be useful to read a bit about the concept of serialization, which allows you to (rather) easily store the state of objects within a file. A brief introduction can be found on the Wiki: Package File Format/Serialization
Header
This global header can be found at the beginning of the file (offset 0). It is the starting point for every operation.
Offset | Type | Property | Description |
0 | DWORD | Signature | Always 0x9E2A83C1; use this to verify that you are indeed reading an Unreal package |
4 | WORD | PackageVersion | Version of the file format; Unreal1 mostly uses 61-63, UT 67-69. Note, however, that quite a few packages used with UT have Unreal1 versions; see the appendix for more details |
6 | WORD | LicenseMode | The license number; different for each game |
8 | DWORD | Package Flags | Global package flags, e.g. whether a package may be downloaded from a game server; described in the appendix |
12 | DWORD | Name Count | Number of entries in the name-table |
16 | DWORD | Name Offset | Offset of the name-table within the file |
20 | DWORD | Export Count | Number of entries in the export-table |
24 | DWORD | Export Offset | Offset of the export-table within the file |
28 | DWORD | Import Count | Number of entries in the import-table |
32 | DWORD | Import Offset | Offset of the import-table within the file |
After the ImportOffset, the header differs between versions. The only interesting fact, though, is that a GUID was introduced for file format versions >= 68. It can be found right after the ImportOffset:
36 | 16 BYTEs | GUID | Unique identifier; used for package downloading from servers |
Older package versions have a list of GUIDs (pointed to by the same kind of count/offset pair as above) in a separate section rather than space for just one. Tests reveal that UT uses the last one in the list when there is more than one, but such packages do not seem to occur in the wild.
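The header layout above can be read with a few lines of Python. This is only a minimal sketch: the names `PackageHeader` and `parse_header` are made up for illustration, and it assumes that all fields are stored little-endian, which the table does not state explicitly.

```python
import struct
from typing import NamedTuple, Optional

PACKAGE_SIGNATURE = 0x9E2A83C1


class PackageHeader(NamedTuple):
    version: int
    license_mode: int
    package_flags: int
    name_count: int
    name_offset: int
    export_count: int
    export_offset: int
    import_count: int
    import_offset: int
    guid: Optional[bytes]  # only read for package versions >= 68


def parse_header(data: bytes) -> PackageHeader:
    """Parse the fixed 36-byte header found at offset 0 of a package."""
    # '<' = little-endian (assumption); I = DWORD, H = WORD.
    fields = struct.unpack_from("<IHH7I", data, 0)
    signature, version, license_mode = fields[:3]
    if signature != PACKAGE_SIGNATURE:
        raise ValueError("not an Unreal package (bad signature)")
    # The single GUID directly follows ImportOffset in versions >= 68.
    guid = data[36:52] if version >= 68 else None
    return PackageHeader(version, license_mode, *fields[3:], guid)
```

Given the raw bytes of a package, `parse_header(raw).name_offset` then tells you where to seek for the name-table. The GUID list of older versions is deliberately left out, since its layout is not described here.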
Index Tables
The Unreal Engine introduces two new variable types. The first one is a rather simple string type, called NAME from now on. The second one is a bit more tricky: the CompactIndex, called INDEX from now on, compresses an ordinary DWORD down to one to five BYTEs. Both types, as well as the ObjectReference, are described in the following paper: Package File Format/Data Details
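To give an idea of how an INDEX is decoded, here is a small sketch. The byte layout is not spelled out in this article; the code assumes the commonly documented one: in the first byte, bit 7 is the sign, bit 6 says that another byte follows and bits 0-5 hold the lowest value bits; bytes two to four carry seven value bits each plus a "more" flag in bit 7; a fifth byte, if present, contributes all eight bits. The function name `read_compact_index` is made up for illustration.

```python
from typing import Tuple


def read_compact_index(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one INDEX starting at data[pos]; return (value, next_pos)."""
    first = data[pos]
    pos += 1
    negative = bool(first & 0x80)  # bit 7: sign flag
    value = first & 0x3F           # bits 0-5: lowest six value bits
    more = bool(first & 0x40)      # bit 6: another byte follows
    shift = 6
    while more:
        byte = data[pos]
        pos += 1
        if shift < 27:
            # Bytes two to four: seven value bits, bit 7 = "more" flag.
            value |= (byte & 0x7F) << shift
            more = bool(byte & 0x80)
        else:
            # Fifth byte: all eight bits belong to the value.
            value |= byte << shift
            more = False
        shift += 7
    return (-value if negative else value), pos
```

Under these assumptions, the two bytes 0x68 0x0F decode to 1000, while any value from 0 to 63 fits into a single byte.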
Name-Table
The first and simplest of the three tables is the name-table. It can be considered an index of all unique names used for objects and references within the file. Later on, you’ll often find indexes into this table instead of a string containing the object name.
Type | Property | Description |
NAME | Object Name | |
DWORD | Object Flags | Flags for the object; described in the appendix |
Export-Table
The export-table is an index for all objects within the package. Every object in the body of the file has a corresponding entry in this table, with information like offset within the file etc.
Type | Property | Description |
INDEX | Class | Class of the object, e.g. ‘Texture’ or ‘Palette’; stored as an ObjectReference |
INDEX | Super | Object parent; again an ObjectReference |
DWORD | Group | Internal package/group of the object, e.g. ‘Floor’ for floor textures; ObjectReference |
INDEX | Object Name | The name of the object; an index into the name-table |
DWORD | Object Flags | Flags for the object; described in the appendix |
INDEX | Serial Size | Total size of the object |
INDEX | Serial Offset | Offset of the object; this field only exists if the SerialSize is larger than 0 |
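Reading one export-table entry could look roughly like the sketch below. The names `ExportEntry` and `read_export_entry` are hypothetical. The INDEX decoder is passed in as the parameter `read_index`, because its byte layout is covered on the Data Details page rather than here; it is expected to return the decoded value and the position of the next byte. The sketch also assumes little-endian DWORDs and reads Group as a signed value, since it holds an ObjectReference.

```python
import struct
from typing import Callable, NamedTuple, Optional, Tuple

# (data, pos) -> (decoded value, position after the INDEX)
IndexReader = Callable[[bytes, int], Tuple[int, int]]


class ExportEntry(NamedTuple):
    class_ref: int
    super_ref: int
    group_ref: int
    name_index: int
    object_flags: int
    serial_size: int
    serial_offset: Optional[int]  # absent when serial_size is 0


def read_export_entry(
    data: bytes, pos: int, read_index: IndexReader
) -> Tuple[ExportEntry, int]:
    """Read one export-table entry; return (entry, next_pos)."""
    class_ref, pos = read_index(data, pos)
    super_ref, pos = read_index(data, pos)
    (group_ref,) = struct.unpack_from("<i", data, pos)  # signed DWORD
    pos += 4
    name_index, pos = read_index(data, pos)
    (object_flags,) = struct.unpack_from("<I", data, pos)
    pos += 4
    serial_size, pos = read_index(data, pos)
    serial_offset = None
    if serial_size > 0:
        # Serial Offset is only stored for objects that have data.
        serial_offset, pos = read_index(data, pos)
    entry = ExportEntry(class_ref, super_ref, group_ref, name_index,
                        object_flags, serial_size, serial_offset)
    return entry, pos
```

Calling this function Export Count times, starting at Export Offset and feeding each returned position into the next call, walks the whole table.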
Import-Table
The third table holds references to objects in external packages. For example, a texture might have a DetailTexture (which makes for the nice structure you see if you have a very close look at a texture). These DetailTextures are all stored in a single package, as they are used by many different textures in different package files. The property of the texture object then only needs to store an index into the import-table, as the entry in the import-table already points to the DetailTexture in the other package.
Type | Property | Description |
INDEX | Class Package | Package file in which the class of the object is defined; an index into the name-table |
INDEX | Class Name | Class of the object, e.g. ‘Texture’, ‘Palette’, ‘Package’, etc.; an index into the name-table |
DWORD | Package | Reference where the object resides; ObjectReference |
INDEX | Object Name | The name of the object; an index into the name-table |
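How an ObjectReference selects between the two tables can be sketched as follows. The convention used here is an assumption taken from the usual descriptions of the format (it is detailed on the Data Details page, not in this article): 0 means "no object", a positive value n refers to entry n-1 of the export-table, and a negative value n refers to entry -n-1 of the import-table. The function name and the plain lists of object names are purely illustrative.

```python
from typing import List, Optional, Tuple


def resolve_reference(
    ref: int, exports: List[str], imports: List[str]
) -> Optional[Tuple[str, str]]:
    """Map an ObjectReference to ("export" | "import", object name)."""
    if ref == 0:
        return None  # null reference
    if ref > 0:
        return ("export", exports[ref - 1])   # object inside this package
    return ("import", imports[-ref - 1])      # object in another package
```

With `exports = ["Wall", "Floor"]` and `imports = ["Texture", "DetailStone"]`, a reference of 2 resolves to the export 'Floor' and a reference of -2 to the import 'DetailStone'.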
Body/Object
Each object consists of a list of properties at the beginning and the actual object itself.
Object Properties
When you jump to the offset of an object, you'll first be confronted with the object properties before the actual object starts. The format is rather straightforward. Each property starts with an INDEX-type reference into the Name-Table, giving you the property's name. The byte after it does the magic of telling you what kind of data follows; for example, 0x02 flags a DWORD-sized integer type. Then comes the actual property data. The procedure repeats until the reference into the Name-Table returns 'None' (case-insensitive) as the name.
That said, there are some bit-tricks to deal with arrays, booleans and such. For more info on these, as well as a full list of info-bytes, read Antonio's package docs.
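The loop described above can be sketched like this. Only the control flow is shown: the INDEX decoder (`read_index`) and the decoder for the property data (`read_value`) are hypothetical, caller-supplied functions, because the full list of info bytes and the bit-tricks for arrays and booleans are only documented in Antonio's package docs.

```python
from typing import Callable, Dict, List, Tuple

# (data, pos) -> (decoded value, position after the INDEX)
IndexReader = Callable[[bytes, int], Tuple[int, int]]
# (info_byte, data, pos) -> (property value, position after the data)
ValueReader = Callable[[int, bytes, int], Tuple[object, int]]


def read_properties(
    data: bytes,
    pos: int,
    names: List[str],
    read_index: IndexReader,
    read_value: ValueReader,
) -> Tuple[Dict[str, object], int]:
    """Read properties until the 'None' name; return (props, next_pos)."""
    props: Dict[str, object] = {}
    while True:
        name_index, pos = read_index(data, pos)
        name = names[name_index]
        if name.lower() == "none":
            # End of the property list; the object itself starts at pos.
            return props, pos
        info = data[pos]  # info byte: tells what kind of data follows
        pos += 1
        props[name], pos = read_value(info, data, pos)
```

The returned position is where the class-specific data of the object begins.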
Sample Objects (Texture Class)
After the properties are finished the object starts. It basically consists of a predefined set of properties. As an example, the texture class (for good old UT) will be explained below. The texture class is a native one, which means that it doesn't have a generic header in addition to its own data. The layout looks like this:
Type | Property | Description |
BYTE | MipMapCount | Count of MipMaps in object |
The next set of variables repeats itself for each MipMap.
Type | Property | Description |
DWORD | WidthOffset | Offset in file; should be the same as SerialOffset in the Export-Table. Only if PkgVer >= 63 |
INDEX | MipMapSize | Size of the image data (in bytes) |
n BYTEs | MipMapData | Image data; one byte per pixel; n = MipMapSize |
DWORD | Width | Texture-width |
DWORD | Height | Texture-height |
BYTE | BitsWidth | Number of bits of Width (e.g. 10 for 1024 pixels) |
BYTE | BitsHeight | Number of bits of Height (e.g. 10 for 1024 pixels) |
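Put together, reading the MipMaps of a texture might look like the following sketch. It follows the field order of the two tables above, assumes little-endian DWORDs, and again takes the INDEX decoder as the hypothetical parameter `read_index`. `MipMap` and `read_mipmaps` are illustrative names, and `pos` is expected to point just past the object's property list.

```python
import struct
from typing import Callable, List, NamedTuple, Tuple

# (data, pos) -> (decoded value, position after the INDEX)
IndexReader = Callable[[bytes, int], Tuple[int, int]]


class MipMap(NamedTuple):
    width: int
    height: int
    bits_width: int
    bits_height: int
    pixels: bytes  # one byte per pixel


def read_mipmaps(
    data: bytes, pos: int, package_version: int, read_index: IndexReader
) -> Tuple[List[MipMap], int]:
    """Read MipMapCount and all MipMaps; return (mipmaps, next_pos)."""
    count = data[pos]
    pos += 1
    mipmaps: List[MipMap] = []
    for _ in range(count):
        if package_version >= 63:
            pos += 4  # skip the WidthOffset DWORD
        size, pos = read_index(data, pos)
        pixels = data[pos:pos + size]
        pos += size
        # Width, Height (DWORDs), then BitsWidth, BitsHeight (BYTEs).
        width, height, bits_w, bits_h = struct.unpack_from("<IIBB", data, pos)
        pos += 10
        mipmaps.append(MipMap(width, height, bits_w, bits_h, pixels))
    return mipmaps, pos
```

The pixel bytes are kept as raw data here; turning them into colours needs the texture's palette, which is outside the scope of this sketch.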
Appendix
A. Links
- http://www.acordero.org/: _The_ resource regarding package files. A very detailed reference of the package format, the UT-Package-Tool and a Delphi unit can be found there.
- http://ut-files.com/index.php?dir=Utilities/&file=utcms_source.zip: A C++ class for reading packages. Totally free for use. [Link updated with new location]
B. Notes
The last part about the object properties and the texture class was written in a hurry. I'm sorry it took so long for me to finish that piece.
The fileformat itself, btw, has not changed between the versions of UT (except the odd new property and such). Many of the objects however have changed a lot or were replaced by enhanced types (such as my beloved texture class...).
Comments/Discussion
Jesco: I will continue after here tomorrow. Now it's time for some sleep :)
Mychaeel: Good start. :-) Have a look at UMOD/File Format too if you haven't already. A common thing like the compact index format could move to a shared page, for instance.
Jesco: Ah, I haven't noticed that, yet. Saves me the hassle to explain the compact index ;) Where should the page for the compact index be put to? I suggest making it either a subpage of UMOD/File Format or Package File Format.
Mychaeel: Putting it on a subpage of Package File Format sounds more obvious to me.
Tarquin: Other pages to grab material from / link to / etc:
- Package
- Package redirects to the above.
- UT Package Tool (just a link to a site)
Jesco: Ok, I'll work on it later today when I come back from university. Maybe I should also mail Antonio and ask him if I could post a copy of his reference docs for all those thousands of different objects that I don't have a clue of ;)
Jesco: I haven't forgot about this article, it just went down my priority list, unfortunately.
Diki: Hey Jesco, I don't suppose you seriously haven't forgotten about this article. I'm trying to find more info about this topic!
RmzVoid: Where I can get codes of Object Properties types?
Diablo: @ anyone who wants to dig deeper inside unreal file format structures: take a look at this project:
http://sourceforge.net/projects/ushock/
@3DBuzz: Can someone Please upload the "UTCMS_source.zip file" to another server? The current link is dead yet interest in reading the packages is still there.
Tarquin: Alternatively, someone could paste the code into a subpage here.
Plugwash: I want to make a tool that makes some changes to some of the tables without changing the bulk of the file. Is there any reason not to put the tables at the end of the file after everything else (yes, I realise leaving the old tables in means a bit of bloat, but it shouldn't be too significant)?
Xian: Well as far as I can remember, the order is: 1. Headers, 2. Linkers, 3. NameTable (+index where it begins to be used), 4. Compiled Code, 5. Decompiled Code (aka Core.TextBuffer). Although a completely rewritten file parser would be able to read it with the NT at the end, I don't see the point. The code uses pointers to each NT element. It is way more logical to say "Name <Pawn> has pointer 4F6G" and later make a reference to it in the Compiled Code, rather than for the code to memorize all used pointers and then parse the end of the file. I'd say the logic here is the same as compiling from end to beginning (if we'd compile from beginning to end, stuff like x = x + 2 or x += 2 wouldn't work, without pre-parsing, I guess). You might be able to add new elements to the table on the condition you change the index they get used at, you include serialization (to not get a serialization error) and modify the NT size and namespace used by linkers (also used by serialization I think). Excuse my raw descriptions, but it should be pretty accurate :)
Wormbo: The locations of the name, import and export tables are specified in the file header, and the locations of other objects in the file are specified in the export table. Where those tables or objects actually are in the file or in what order they appear is irrelevant, as long as everything is in the location mentioned by the header or export table.
Xian: True. The linker descriptions specify each linker size and its offset (i.e. names, exports and imports), setting classes within the package as exports and used classes of other packages as imports. I do like the way the current order of file contents is done, since it's pretty logical (unlike a random placement), and I guess you could shift them back and forth, but it would be readable only by your tool, so I don't see much of a point.
Side note: thinking of inserting names, I am curious how the Engine would react to finding a name that is never used (although in theory it should be ignored). There is one way to convert a string to a name, but the rule is that the name should exist in the nametable.
Anyway, back on topic, what changes did you have in mind, Plugwash ?
Plugwash: If I understand the format's intentions correctly, I don't believe it will matter where the table is in the file, but obviously opinion is split here, so trial and error is going to be the only way to find out ;). I want a string replacer, mainly for dealing with conflicting packages (two packages with the same name but different contents). To some extent it's possible to change strings in place by hand (I've done it before; see the workaround I posted for the credits version mismatch issue on the UT troubleshooting page), but this limits you to replacing them with another string of the same length. On the other hand, I really, really don't want to go to the trouble of writing a full package deserialiser and reserialiser.
Plugwash: Yep, UT doesn't seem to care if the names section is at the end. I'm just trying to clarify the situation with regard to GUIDs now ;).
BigBadaBoom: Can anyone direct me to a documentation of the ArrayProperty? I've figured out most stuff I need myself but arrays are still a total puzzle to me. :(
Dimension4: Export table structure is invalid.
Dimension4: Making a reserializer is quite hard because you have to change many offsets:
- ImportTable offset
- ExportTable offset
- All offsets in ExportTable
You get the picture :eek: